library(tidyverse)

While ggplot2 comes with a lot of batteries included, the extension ecosystem provides priceless additional features.

Plot composition

Here we use the patchwork package, but note that cowplot is also a popular alternative.

We start by creating three separate plots:

data("msleep", package = "ggplot2")

p1 <- ggplot(msleep) + 
  geom_boxplot(aes(x = sleep_total, y = vore, fill = vore))
p1

p2 <- ggplot(msleep) + 
  geom_bar(aes(y = vore, fill = vore))
p2

p3 <- ggplot(msleep) + 
  geom_point(aes(x = bodywt, y = sleep_total, colour = vore)) + 
  scale_x_log10()
p3

Combining them with patchwork is a breeze using its operators: + adds plots to a grid, | places plots side by side and / stacks them on top of each other.

library(patchwork)
p1 + p2 + p3

p_all <- (p1 | p2) / p3
p_all

p_all + plot_layout(guides = 'collect')

p_all & theme(legend.position = 'none') ## & is a new operator: it applies the theme to every plot

p_all <- p_all & theme(legend.position = 'none')

p_all + plot_annotation(
  title = 'Mammalian sleep patterns',
  tag_levels = 'A'
)
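
The operators work well when every plot is written out by hand. When plots are generated programmatically, patchwork's wrap_plots() takes a list of plots together with layout arguments such as ncol. A minimal sketch; the choice of variables and the histogram geom are illustrative assumptions, not part of the original example:

```r
library(ggplot2)
library(patchwork)

data("msleep", package = "ggplot2")

# One histogram per variable, created in a loop rather than written out by hand
vars <- c("sleep_total", "sleep_rem", "awake")
plots <- lapply(vars, function(v) {
  ggplot(msleep, aes(x = .data[[v]])) +
    geom_histogram(bins = 20) +
    labs(title = v)
})

# wrap_plots() arranges a list of plots; ncol works as in plot_layout()
p_grid <- wrap_plots(plots, ncol = 2)
p_grid
```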

Exercises

Patchwork will assign the same amount of space to each plot by default, but this can be controlled with the widths and heights arguments in plot_layout(). These take a numeric vector giving the relative sizes (e.g. c(2, 1) will make the first plot twice as big as the second). Modify the code below so that the middle plot takes up half of the total space:

p <- ggplot(mtcars) + 
  geom_point(aes(x = disp, y = mpg))

p + p + p

p + p + p + plot_layout(widths = c(1, 2, 1))
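
Relative sizes are normalised by their sum, which is why c(1, 2, 1) gives the middle plot 2 / (1 + 2 + 1) = 1/2 of the width. A small sketch checking that arithmetic and applying the same idea to heights for a vertical stack:

```r
library(ggplot2)
library(patchwork)

p <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg))

# Relative sizes are divided by their total
widths <- c(1, 2, 1)
shares <- widths / sum(widths)
shares  # 0.25 0.50 0.25

# heights does the same for plots stacked with /
p_stack <- p / p / p + plot_layout(heights = c(1, 2, 1))
p_stack
```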


The & operator can be used with any type of ggplot2 object, not just themes.

Modify the code below so the two plots share the same y-axis (same limits).

p1 <- ggplot(mtcars[mtcars$gear == 3,]) + 
  geom_point(aes(x = disp, y = mpg))

p2 <- ggplot(mtcars[mtcars$gear == 4,]) + 
  geom_point(aes(x = disp, y = mpg))

p1 + p2

p1 + p2 & coord_cartesian(ylim = c(0, 40))
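
patchwork also has a * operator. Per the patchwork documentation, & recurses into nested patchworks while * only modifies the plots at the current nesting level. A sketch contrasting the two; the third plot (gear == 5), the helper by_gear() and the limits c(10, 35) are illustrative choices:

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: one scatterplot per gear count
by_gear <- function(g) {
  ggplot(mtcars[mtcars$gear == g, ]) +
    geom_point(aes(x = disp, y = mpg))
}
p1 <- by_gear(3)
p2 <- by_gear(4)
p3 <- by_gear(5)

nested <- (p1 | p2) / p3

# & recurses into the nested (p1 | p2), so all three plots get these y-limits
shared_y <- nested & scale_y_continuous(limits = c(10, 35))

# * only affects plots at the current nesting level (per the patchwork docs),
# i.e. p3 here, while the nested p1 | p2 keeps its default theme
top_level_bw <- nested * theme_bw()
```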


Patchwork contains many features for fine-tuning the layout and annotation. Very complex layouts can be obtained by providing a design specification to the design argument in plot_layout(). The design can be defined as a textual representation of the cells. Use the layout given below. How should the textual representation be understood?

p1 <- ggplot(mtcars) + 
  geom_point(aes(x = disp, y = mpg))
p2 <- ggplot(mtcars) + 
  geom_bar(aes(x = factor(gear)))
p3 <- ggplot(mtcars) + 
  geom_boxplot(aes(x = factor(gear), y = mpg))

layout <- '
AA#
#BB
C##
'
p1 + p2 + p3 + plot_layout(design = layout)

layout <- '
11#
112
332
##2
'

p1 + p2 + p3 + plot_layout(design = layout)
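
In a text design, each character other than # marks the cells occupied by one plot, and # marks an empty cell. The same layout can be written with patchwork's area(t, l, b, r), which gives the top row, left column, bottom row and right column of each plot. A sketch of the first design above rewritten this way:

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + geom_point(aes(x = disp, y = mpg))
p2 <- ggplot(mtcars) + geom_bar(aes(x = factor(gear)))
p3 <- ggplot(mtcars) + geom_boxplot(aes(x = factor(gear), y = mpg))

# Equivalent of the text design 'AA#' / '#BB' / 'C##'
design <- c(
  area(t = 1, l = 1, b = 1, r = 2),  # A: row 1, columns 1-2
  area(t = 2, l = 2, b = 2, r = 3),  # B: row 2, columns 2-3
  area(t = 3, l = 1)                 # C: row 3, column 1 (b and r default to t and l)
)
p_design <- p1 + p2 + p3 + plot_layout(design = design)
p_design
```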

Animation

ggplot2 is focused on static plots, but gganimate extends its API and grammar to describe animations. As such, it feels like a very natural extension of ggplot2.

ggplot(economics) + 
  geom_line(aes(x = date, y = unemploy))

library(gganimate)

ggplot(economics) + 
  geom_line(aes(x = date, y = unemploy)) + 
  transition_reveal(along = date)
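
Printing an animation renders it with gganimate's default settings. animate() exposes rendering options such as nframes and fps, and anim_save() writes the result to disk. A minimal sketch, assuming the gifski package is installed for GIF output (rendering is skipped otherwise); the frame count and file name are arbitrary:

```r
library(ggplot2)
library(gganimate)

anim <- ggplot(economics) +
  geom_line(aes(x = date, y = unemploy)) +
  transition_reveal(along = date)

# Render explicitly instead of relying on print(); only if gifski is available
if (requireNamespace("gifski", quietly = TRUE)) {
  gif <- animate(anim, nframes = 50, fps = 10, renderer = gifski_renderer())
  # Save the rendered GIF to a temporary file
  anim_save(file.path(tempdir(), "unemployment.gif"), animation = gif)
}
```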

There are many different transitions that control how data is interpreted for animation, as well as a range of other animation-specific features.

ggplot(mpg) + 
  geom_bar(aes(x = factor(cyl)))

ggplot(mpg) + 
  geom_bar(aes(x = factor(cyl))) + 
  labs(title = 'Number of cars in {closest_state} by number of cylinders') + 
  transition_states(states = year) + 
  enter_grow() + 
  exit_fade()
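
transition_states() also accepts transition_length and state_length, which set the relative time spent tweening between states versus pausing on each state. A sketch of the plot above with slower transitions; the 2:1 ratio is an arbitrary choice:

```r
library(ggplot2)
library(gganimate)

anim <- ggplot(mpg) +
  geom_bar(aes(x = factor(cyl))) +
  labs(title = 'Number of cars in {closest_state} by number of cylinders') +
  # Relative durations: each tween takes twice as long as each pause
  transition_states(states = year, transition_length = 2, state_length = 1) +
  enter_grow() +
  exit_fade()
```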

Exercises

The animation below will animate between points showing cars with different cylinders.

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  ggtitle("Cars with {closest_state} cylinders") +  ## string interpolation with glue
  transition_states(factor(cyl))

gganimate uses the group aesthetic to match observations between states. By default, all observations share the same group, so observations are matched by their position (the first row of 4-cylinder cars is matched to the first row of 5-cylinder cars, etc.). This is clearly wrong here (why?). Add a mapping to the group aesthetic to ensure that points do not move between the different states.

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy, group = factor(cyl))) + 
  ggtitle("Cars with {closest_state} cylinders") +  ## string interpolation with glue
  transition_states(factor(cyl))
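
An alternative fix, sketched below, gives every row its own group. Because no group value occurs in more than one state, no observation is matched across states, so points enter and exit rather than move. The id column is a hypothetical helper added for illustration:

```r
library(ggplot2)
library(gganimate)

# Hypothetical helper column: a unique id per car (row)
mpg_id <- mpg
mpg_id$id <- seq_len(nrow(mpg_id))

anim <- ggplot(mpg_id) +
  # Each point is its own group, so nothing is matched between states
  geom_point(aes(x = displ, y = hwy, group = id)) +
  ggtitle("Cars with {closest_state} cylinders") +
  transition_states(factor(cyl))
```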


In the presence of discrete aesthetic mappings (colour below), the group is deduced if it is not given explicitly. By default, objects that appear or disappear during the animation simply pop in and out of existence. The enter_*() and exit_*() functions can be used to control this behaviour. Experiment with the different enter and exit functions provided by gganimate below. What happens if you add multiple enter or exit functions to the same animation?

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy, color = factor(cyl))) + 
  ggtitle("Cars with {closest_state} cylinders") + 
  transition_states(factor(cyl)) +
  enter_fade() + 
  exit_shrink()
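
Enter and exit functions are added like any other component, so several can be stacked in one animation, and some take arguments. The sketch below only builds the animation; printing anim renders it. The fly-out coordinates passed to exit_fly() are arbitrary:

```r
library(ggplot2)
library(gganimate)

anim <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = factor(cyl))) +
  ggtitle("Cars with {closest_state} cylinders") +
  transition_states(factor(cyl)) +
  # Two enter functions on the same animation
  enter_fade() +
  enter_grow() +
  # Exiting points fly towards the (arbitrary) position displ = 7, hwy = 40
  exit_fly(x_loc = 7, y_loc = 40)
```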


In the animation below (as in all the other animations), changes happen at constant speed. How values change during an animation is called easing, and it can be controlled using the ease_aes() function. Read the documentation for ease_aes() and experiment with different easings in the animation.

mpg2 <- tidyr::pivot_longer(mpg, c(cty,hwy))

ggplot(mpg2) + 
  geom_point(aes(x = displ, y = value)) + 
  ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") + 
  transition_states(name) +
  ease_aes("bounce-in-out")

ggplot(mpg2) + 
  geom_point(aes(x = displ, y = value)) + 
  ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") + 
  transition_states(name) +
  ease_aes("elastic-out")
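
Besides a single easing for everything, ease_aes() takes a default easing plus per-aesthetic overrides given as named arguments. A sketch where y uses a different easing from the default; the specific easing names are just examples:

```r
library(ggplot2)
library(gganimate)

mpg2 <- tidyr::pivot_longer(mpg, c(cty, hwy))

anim <- ggplot(mpg2) +
  geom_point(aes(x = displ, y = value)) +
  ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") +
  transition_states(name) +
  # Default easing for all aesthetics, overridden for y only
  ease_aes("cubic-in-out", y = "bounce-out")
```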

Annotation

Text is a huge part of storytelling with your visualisation. Historically, textual annotations have not been the strongest part of ggplot2, but new extensions make up for that.

Standard geom_text() will often result in overlapping labels:

ggplot(mtcars, aes(x = disp, y = mpg)) + 
  geom_point() + 
  geom_text(aes(label = row.names(mtcars)))

ggrepel takes care of that by moving labels away from each other and from the data points:

library(ggrepel)

ggplot(mtcars, aes(x = disp, y = mpg)) + 
  geom_point() + 
  geom_text_repel(aes(label = row.names(mtcars)))
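
When labels are still crowded, a few geom_text_repel() arguments are worth knowing: box.padding adds space around labels, min.segment.length = 0 always draws a connector back to the point, and max.overlaps controls when labels are dropped. A sketch, assuming ggrepel 0.9.0 or later (where max.overlaps was introduced); the values are illustrative:

```r
library(ggplot2)
library(ggrepel)

p_repel <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_text_repel(
    aes(label = row.names(mtcars)),
    box.padding = 0.5,        # padding around each label (in lines)
    min.segment.length = 0,   # draw a segment for every label, however short
    max.overlaps = Inf        # never drop labels for having too many overlaps
  )
```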

If you want to highlight certain parts of your data and describe them, the geom_mark_*() family of geoms from ggforce has your back:

library(ggforce)
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() + 
  geom_mark_ellipse(aes(filter = gear == 4,
                        label = '4 gear cars',
                        description = '4 gear cars tend to have both higher fuel efficiency and lower displacement'))
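
The mark geoms also expose styling arguments for the mark itself, the label box and the connector. A sketch using expand, label.fill and con.colour from ggforce's geom_mark_ellipse(); the values are arbitrary:

```r
library(ggplot2)
library(ggforce)

p_mark <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_mark_ellipse(
    aes(filter = gear == 4, label = '4 gear cars'),
    expand = unit(2, "mm"),   # how far the ellipse extends beyond the points
    label.fill = "grey90",    # background colour of the label box
    con.colour = "grey40"     # colour of the line connecting label and mark
  )
```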

Exercises

ggrepel has a ton of settings for controlling how text labels move. Often, though, the most effective approach is simply not to label everything. There are two strategies for that: either use only a subset of the data for the repel layer, or set the label to "" for the points you don't want to label. Try both in the plot below, labelling only 10 random points.

mtcars2 <- mtcars
mtcars2$label <- rownames(mtcars2)
points_to_label <- sample(nrow(mtcars), 10)

ggplot(mtcars2, aes(x = disp, y = mpg)) + 
  geom_point() + 
  geom_text_repel(data = mtcars2[points_to_label, ], aes(label = label))

mtcars2$label[-points_to_label] <- ""

ggplot(mtcars2, aes(x = disp, y = mpg)) + 
  geom_point() + 
  geom_text_repel(aes(label = label))
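
Because sample() is random, the labelled subset changes on every run; set.seed() makes it reproducible. Labels can also be chosen by a rule instead of at random. A sketch of both; the seed and the heaviest-cars rule are illustrative assumptions:

```r
library(ggplot2)
library(ggrepel)

mtcars2 <- mtcars
mtcars2$label <- rownames(mtcars2)

# Fixing the seed makes the random subset the same on every run
set.seed(42)
points_to_label <- sample(nrow(mtcars2), 10)

# Rule-based alternative: keep labels only for the 10 heaviest cars
heaviest <- order(mtcars2$wt, decreasing = TRUE)[1:10]
mtcars2$label_rule <- ifelse(seq_len(nrow(mtcars2)) %in% heaviest, mtcars2$label, "")

p_rule <- ggplot(mtcars2, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_text_repel(aes(label = label_rule))
```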


Explore the documentation for geom_text_repel(). Find a way to ensure that the labels in the plot below only repel in the vertical direction.

ggplot(mtcars2, aes(x = disp, y = mpg)) + 
  geom_point() + 
  geom_text_repel(aes(label = label), direction = "y")
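
direction = "y" is often paired with a horizontal nudge and left-justified text, so that labels sit to the right of their points and only slide vertically. A sketch; the nudge of 30 displacement units is an arbitrary value:

```r
library(ggplot2)
library(ggrepel)

mtcars2 <- mtcars
mtcars2$label <- rownames(mtcars2)

p_vertical <- ggplot(mtcars2, aes(x = disp, y = mpg)) +
  geom_point() +
  geom_text_repel(
    aes(label = label),
    direction = "y",  # labels may only move up or down
    nudge_x = 30,     # start every label 30 units to the right of its point
    hjust = 0         # left-justify the text at that position
  )
```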


ggforce comes with 4 different types of mark geoms. Try them all out in the code below:

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() + 
  geom_mark_ellipse(aes(filter = gear == 4, label = '4 gear cars'))

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() + 
  geom_mark_circle(aes(filter = gear == 4, label = '4 gear cars'))

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() + 
  geom_mark_hull(aes(filter = gear == 4, label = '4 gear cars'), concavity = 10)

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() + 
  geom_mark_rect(aes(filter = gear == 4, label = '4 gear cars'))

The ggtext package makes styling text with Markdown and CSS-like syntax relatively easy.

Networks

ggplot2 is focused on tabular data. Network data, in any shape and form, is handled by ggraph.

library(ggraph)
library(tidygraph)

graph <- create_notable('zachary') %>% 
  mutate(clique = as.factor(group_infomap()))

graph
#> # A tbl_graph: 34 nodes and 78 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 34 x 1 (active)
#>   clique
#>   <fct>
#> 1 2
#> 2 2
#> 3 2
#> 4 2
#> 5 3
#> 6 3
#> # … with 28 more rows
#> #
#> # Edge Data: 78 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     1     3
#> 3     1     4
#> # … with 75 more rows

ggraph(graph) + 
  geom_mark_hull(aes(x, y, fill = clique)) + 
  geom_edge_link() + 
  geom_node_point(size = 2)
#> Using `stress` as default layout
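
The layout can also be computed up front with create_layout(), which returns a data frame of node positions plus node attributes; naming the layout explicitly also avoids the default-layout message. A sketch:

```r
library(ggraph)
library(tidygraph)

graph <- create_notable('zachary') %>%
  mutate(clique = as.factor(group_infomap()))

# Compute node positions once; the result is a data frame with x, y and node data
lay <- create_layout(graph, layout = "stress")
head(lay[, c("x", "y", "clique")])

# The precomputed layout can be passed straight to ggraph()
p_net <- ggraph(lay) +
  geom_edge_link() +
  geom_node_point(aes(colour = clique), size = 2)
```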

Dendrograms are just a specific type of network, so a hierarchical clustering can be plotted directly:

iris_clust <- hclust(dist(iris[, 1:4]))

ggraph(iris_clust) + 
  geom_edge_bend() + 
  geom_node_point(aes(filter = leaf))
#> Using `dendrogram` as default layout
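
The dendrogram layout has a circular option, and geom_edge_elbow() draws the classic right-angled branches. A sketch of a circular version of the tree above:

```r
library(ggraph)

iris_clust <- hclust(dist(iris[, 1:4]))

p_circ <- ggraph(iris_clust, layout = "dendrogram", circular = TRUE) +
  geom_edge_elbow() +                      # right-angled branches
  geom_node_point(aes(filter = leaf)) +    # points only on the leaves
  coord_fixed()                            # keep the circle round
```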

Exercises

Most network plots are defined by a layout algorithm, which takes the network structure and calculates a position for each node. The layout algorithm is global and set in ggraph(). The default auto layout will inspect the network object and try to choose a sensible layout for it (e.g. dendrogram for a hierarchical clustering as above). There is, however, no single optimal layout, and it is often a good idea to try out different layouts. Try out different layouts in the graph below. See the ggraph website for an overview of the different layouts.

ggraph(graph, layout = "kk") + 
  geom_edge_link() + 
  geom_node_point(aes(colour = clique), size = 3)

ggraph(graph, layout = "eigen") + 
  geom_edge_link() + 
  geom_node_point(aes(colour = clique), size = 3)

ggraph(graph, layout = "backbone") + 
  geom_edge_link() + 
  geom_node_point(aes(colour = clique), size = 3)
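
To compare layouts side by side, the plots can be generated in a loop and combined with patchwork's wrap_plots(). A sketch; the selection of layouts is illustrative:

```r
library(ggraph)
library(tidygraph)
library(patchwork)

graph <- create_notable('zachary') %>%
  mutate(clique = as.factor(group_infomap()))

# Draw the same graph with several layouts, one plot per layout
layouts <- c("kk", "fr", "stress", "circle")
plots <- lapply(layouts, function(l) {
  ggraph(graph, layout = l) +
    geom_edge_link(edge_alpha = 0.3) +
    geom_node_point(aes(colour = clique), size = 2) +
    ggtitle(l)
})

# Combine them into a grid with a single shared legend
comparison <- wrap_plots(plots, ncol = 2, guides = "collect")
```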


There are many different ways to draw edges. Try to use geom_edge_parallel() in the graph below to show the presence of multiple edges.

highschool_gr <- as_tbl_graph(highschool)

highschool_gr
#> # A tbl_graph: 70 nodes and 506 edges
#> #
#> # A directed multigraph with 1 component
#> #
#> # Node Data: 70 x 1 (active)
#>   name
#>   <chr>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> # … with 64 more rows
#> #
#> # Edge Data: 506 x 3
#>    from    to  year
#>   <int> <int> <dbl>
#> 1     1    13  1957
#> 2     1    14  1957
#> 3     1    20  1957
#> # … with 503 more rows

ggraph(highschool_gr) + 
  geom_edge_parallel() + 
  geom_node_point()
#> Using `stress` as default layout

ggraph(highschool_gr) + 
  geom_edge_fan() + 
  geom_node_point()
#> Using `stress` as default layout
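
tidygraph's edge_is_multiple() flags multiple edges, which can then be mapped to an edge aesthetic to make them stand out. A sketch, assuming igraph's convention (via which_multiple()) that one edge of each repeated pair is left unflagged:

```r
library(ggraph)
library(tidygraph)

highschool_gr <- as_tbl_graph(highschool)

# Flag repeated edges; following igraph's which_multiple(), one edge of each
# repeated (from, to) pair is assumed to stay FALSE
highschool_multi <- highschool_gr %>%
  activate(edges) %>%
  mutate(multiple = edge_is_multiple())

p_multi <- ggraph(highschool_multi, layout = "stress") +
  geom_edge_parallel(aes(edge_colour = multiple)) +
  geom_node_point()
```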

Faceting works in ggraph as it does in ggplot2, but you must choose to facet by either nodes or edges. Modify the graph below to facet the edges by the year variable (using facet_edges()).

ggraph(highschool_gr) + 
  geom_edge_fan() + 
  geom_node_point() + 
  facet_edges(~year)
#> Using `stress` as default layout

Looks

Many people have already designed beautiful (and horrible) themes for you. Use them as a base.

p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(color = factor(carb))) +
  labs(
    x = 'Fuel efficiency (mpg)', 
    y = 'Weight (1000 lbs)',
    title = 'Seminal ggplot2 example',
    subtitle = 'A plot to show off different themes',
    caption = 'Source: It’s mtcars — everyone uses it'
  )
library(hrbrthemes)

p + scale_colour_ipsum() + 
    theme_ipsum()

library(ggthemes)
p + scale_colour_excel() + 
    theme_excel()
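
A common pattern is to wrap a complete theme plus your own overrides in a function, giving a reusable house style. A sketch; theme_house() and the chosen overrides are hypothetical:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(color = factor(carb))) +
  labs(x = 'Fuel efficiency (mpg)', y = 'Weight (1000 lbs)',
       title = 'Seminal ggplot2 example')

# Hypothetical house style: a complete base theme plus a few overrides
theme_house <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

p_house <- p + theme_house()
```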

Drawing anything

states <- c(
  'eaten', "eaten but said you didn't", 'cat took it', 'for tonight',
  'will decompose slowly'
)

pie <- data.frame(
  state = factor(states, levels = states),
  amount = c(4, 3, 1, 1.5, 6),
  stringsAsFactors = FALSE
)

ggplot(pie) + 
  geom_col(aes(x = 0, y = amount, fill = state))

ggplot(pie) + 
  geom_col(aes(x = 0, y = amount, fill = state)) + 
  coord_polar(theta = 'y')

ggplot(pie) + 
  geom_col(aes(x = 0, y = amount, fill = state)) + 
  coord_polar(theta = 'y') + 
  scale_fill_tableau(name = NULL, guide = guide_legend(ncol = 2)) + 
  theme_void() + 
  theme(legend.position = 'top')

ggplot(pie) + 
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = amount, fill = state), stat = 'pie') + 
  coord_fixed()
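
With stat = 'pie', geom_arc_bar() turns the amount aesthetic into slice angles, and r0 sets the inner radius, so a value above 0 turns the pie into a donut. A sketch with r0 = 0.5, an arbitrary choice; the data frame is renamed pie_df here to avoid masking graphics::pie():

```r
library(ggplot2)
library(ggforce)

states <- c('eaten', "eaten but said you didn't", 'cat took it', 'for tonight',
            'will decompose slowly')
pie_df <- data.frame(state = factor(states, levels = states),
                     amount = c(4, 3, 1, 1.5, 6))

# stat = 'pie' computes start/end angles from amount; r0 > 0 leaves a hole
p_donut <- ggplot(pie_df) +
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0.5, r = 1, amount = amount, fill = state),
               stat = 'pie') +
  coord_fixed()
```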

ggplot(pie) + 
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = amount, fill = state), stat = 'pie') + 
  coord_fixed() + 
  scale_fill_tableau(name = NULL,
                     guide = guide_legend(ncol = 2)) + 
  theme_void() + 
  theme(legend.position = 'top', 
        legend.justification = 'left')